Extracting Information from Conference Announcements: High Recall, High Precision

Author

  • Kevin Cheong (Language Technology Group, Microsoft Research Institute, School of MPCE, Macquarie University, Sydney NSW 2109, Australia)

Abstract

Conference announcements are distributed widely each day via electronic mail to the research and industrial community. These conferences inform researchers, academics and the industry about the research and development (R&D) work performed in a particular field of interest. There is a wealth of information contained in this multitude of conference announcements. The aim of this research is to extract essential and relevant information from conference announcements and to explore the technologies involved. In this paper we describe an architecture for a system we have developed that extracts relevant and useful information from conference announcement electronic mail messages, with a focus on achieving high recall and precision. We also discuss the extent to which the success of this information extraction task depends on domain and world knowledge.

Similar resources

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...

Applied Text Analytics for Comments on News-Articles A Bachelor Thesis

Several on-line daily newspapers offer readers the opportunity to directly comment on articles. In the Netherlands this feature is used quite often and the quality (grammatically and content-wise) is surprisingly high. The paper develops techniques to collect, store, enrich and analyze these comments. After giving a high-level overview of the Dutch ‘commentosphere’ we zoom in on extracting the ...

Extracting Protein-Protein Interactions from the Literature Using the Hidden Vector State Model

In the field of bioinformatics in solving biological problems, the huge amount of knowledge is often locked in textual documents such as scientific publications. Hence there is an increasing focus on extracting information from this vast amount of scientific literature. In this paper, we present an information extraction system which employs a semantic parser using the Hidden Vector State (HVS)...

Information Extraction for Call for Paper

This paper proposes a system called CFP Manager specialized on IT field and designed to ease the process of searching conference suitable to one’s need. At present, the handling of CFP faces two problems: for emails, the huge quantity of CFP received can be easily skimmed through. For websites, the reviewing of some of the main CFP aggregators available online points out the lack of usable crit...

Literature mining and database annotation of protein phosphorylation using a rule-based system

MOTIVATION: A large volume of experimental data on protein phosphorylation is buried in the fast-growing PubMed literature. While of great value, such information is limited in databases owing to the laborious process of literature-based curation. Computational literature mining holds promise to facilitate database curation. RESULTS: A rule-based system, RLIMS-P (Rule-based LIterature Mining Sy...

Publication date: 2007